DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-Bit CNNs
Tr(·) represents the trace of a matrix. However, the term $\frac{\partial w^t}{\partial \alpha^t}$ in Eq. 4.43 is undefined and cannot be computed by the normal backpropagation process. To address this problem, we propose a decoupled optimization method as follows. In the following, we omit the superscript $\cdot^t$ and define $\tilde{L}$ as
\[
\tilde{L} = \left(\frac{\partial L(\alpha, w)}{\partial w}\right)^{T} \big/ \, \alpha,
\tag{4.44}
\]
which addresses the coupled optimization problem of Eq. 4.42. Note that $R(\cdot)$ is only considered when backtracking. Thus, we have
\[
\frac{\partial L(\alpha, w)}{\partial \alpha} = \mathrm{Tr}\!\left[\alpha \tilde{L} \frac{\partial w}{\partial \alpha}\right].
\tag{4.45}
\]
To simplify the derivation, we rewrite $\tilde{L}$ as $[\tilde{g}_1, \cdots, \tilde{g}_e, \cdots, \tilde{g}_E]$, where each $\tilde{g}_e$ is a column vector. Assuming that $w_m$ and $\alpha_{i,j}$ are independent when $m \neq j$, where $\alpha_{i,j}$ denotes a specific element of the matrix $\alpha$, we have
\[
\left(\frac{\partial w}{\partial \alpha}\right)_m =
\begin{bmatrix}
0 & \cdots & \frac{\partial w_m}{\partial \alpha_{1,m}} & \cdots & 0 \\
\vdots & & \vdots & & \vdots \\
0 & \cdots & \frac{\partial w_m}{\partial \alpha_{e,m}} & \cdots & 0 \\
\vdots & & \vdots & & \vdots \\
0 & \cdots & \frac{\partial w_m}{\partial \alpha_{E,m}} & \cdots & 0
\end{bmatrix}_{E \times M}
\tag{4.46}
\]
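As a concrete illustration of this sparsity pattern, the following NumPy sketch builds $(\partial w/\partial \alpha)_m$ for hypothetical sizes $E$ and $M$ (all names, sizes, and values are illustrative placeholders, not quantities from the search itself):

```python
import numpy as np

# Hypothetical sizes: E candidate operations, M weights; pick one weight index m.
E, M, m = 3, 5, 2
rng = np.random.default_rng(0)

# Placeholder partial derivatives d(w_m)/d(alpha_{e,m}) for e = 1..E.
dwm_dalpha = rng.normal(size=E)

# (dw/dalpha)_m per Eq. 4.46: an E x M matrix in which only column m is
# nonzero, because w_m is assumed independent of alpha_{e,j} whenever j != m.
dw_dalpha_m = np.zeros((E, M))
dw_dalpha_m[:, m] = dwm_dalpha
```

The independence assumption is what confines every derivative of $w_m$ to column $m$.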
and, rewriting $\alpha$ as a column vector $[\alpha_1, \cdots, \alpha_e, \cdots, \alpha_E]^T$, where each $\alpha_e$ is a row vector, we have
\[
\alpha \tilde{L} =
\begin{bmatrix}
\alpha_1 \tilde{g}_1 & \cdots & \alpha_1 \tilde{g}_e & \cdots & \alpha_1 \tilde{g}_E \\
\vdots & & \vdots & & \vdots \\
\alpha_e \tilde{g}_1 & \cdots & \alpha_e \tilde{g}_e & \cdots & \alpha_e \tilde{g}_E \\
\vdots & & \vdots & & \vdots \\
\alpha_E \tilde{g}_1 & \cdots & \alpha_E \tilde{g}_e & \cdots & \alpha_E \tilde{g}_E
\end{bmatrix}_{E \times E}.
\tag{4.47}
\]
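Since each entry $\alpha_e \tilde{g}_{e'}$ is the scalar product of a row of $\alpha$ with a column of $\tilde{L}$, this matrix is just the ordinary product $\alpha \tilde{L}$. A hedged NumPy sketch with placeholder values (sizes and tensors are illustrative assumptions):

```python
import numpy as np

# Hypothetical shapes: alpha stacks E row vectors alpha_e of length M,
# L_tilde stacks E column vectors g_tilde_e' of length M (placeholder values).
E, M = 3, 5
rng = np.random.default_rng(1)
alpha = rng.normal(size=(E, M))
L_tilde = rng.normal(size=(M, E))

# Eq. 4.47: alpha @ L_tilde is E x E with entry (e, e') = alpha_e . g_tilde_e'.
prod = alpha @ L_tilde
```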
Combining Eqs. 4.46 and 4.47, the matrix inside the trace term of Eq. 4.45 can be written as
\[
\alpha \tilde{L} \left(\frac{\partial w}{\partial \alpha}\right)_m =
\begin{bmatrix}
0 & \cdots & \alpha_1 \sum_{e'=1}^{E} \tilde{g}_{e'} \frac{\partial w_m}{\partial \alpha_{e',m}} & \cdots & 0 \\
\vdots & & \vdots & & \vdots \\
0 & \cdots & \alpha_e \sum_{e'=1}^{E} \tilde{g}_{e'} \frac{\partial w_m}{\partial \alpha_{e',m}} & \cdots & 0 \\
\vdots & & \vdots & & \vdots \\
0 & \cdots & \alpha_E \sum_{e'=1}^{E} \tilde{g}_{e'} \frac{\partial w_m}{\partial \alpha_{e',m}} & \cdots & 0
\end{bmatrix}_{E \times M}.
\tag{4.48}
\]
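This structure can be sanity-checked numerically: multiplying the $E \times E$ product by the one-column matrix of Eq. 4.46 should leave only column $m$ nonzero, with the predicted entries. A minimal sketch, again with hypothetical sizes and placeholder values ($J[e', m]$ stands in for $\partial w_m / \partial \alpha_{e',m}$):

```python
import numpy as np

# Hypothetical sizes and placeholder tensors.
E, M, m = 3, 5, 2
rng = np.random.default_rng(2)
alpha = rng.normal(size=(E, M))
L_tilde = rng.normal(size=(M, E))
J = rng.normal(size=(E, M))       # J[e', m] ~ d(w_m)/d(alpha_{e',m})

# (dw/dalpha)_m: only column m is nonzero (Eq. 4.46).
dw_dalpha_m = np.zeros((E, M))
dw_dalpha_m[:, m] = J[:, m]

# Left-hand side of Eq. 4.48.
lhs = alpha @ L_tilde @ dw_dalpha_m

# Eq. 4.48 predicts: entry (e, m) equals alpha_e . (sum_e' g_tilde_e' * J[e', m]),
# and every other column is zero.
col_m = alpha @ (L_tilde @ J[:, m])
```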
Thus the whole array $\alpha \tilde{L} \frac{\partial w}{\partial \alpha}$ is of size $E \times M \times M$. After the above derivation, we compute the $e$-th component of the trace term in Eq. 4.45 as
\[
\mathrm{Tr}\!\left[\alpha \tilde{L} \frac{\partial w}{\partial \alpha}\right]_e
= \alpha_e \sum_{m=1}^{M} \sum_{e'=1}^{E} \tilde{g}_{e'} \frac{\partial w_m}{\partial \alpha_{e',m}}.
\tag{4.49}
\]
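The double sum in Eq. 4.49 can be checked against a direct per-$m$ evaluation of Eq. 4.48. The sketch below compares the closed form with the accumulated $(e, m)$ entries; all shapes and values are illustrative assumptions:

```python
import numpy as np

# Placeholder tensors; J[e', m] stands in for d(w_m)/d(alpha_{e',m}).
E, M = 3, 5
rng = np.random.default_rng(3)
alpha = rng.normal(size=(E, M))
L_tilde = rng.normal(size=(M, E))
J = rng.normal(size=(E, M))

# Closed form of Eq. 4.49: v = sum over m and e' of g_tilde_e' * J[e', m],
# then the e-th component is alpha_e . v.
v = (L_tilde @ J).sum(axis=1)
trace_components = alpha @ v

# Direct evaluation: build alpha @ L_tilde @ (dw/dalpha)_m for each m and
# accumulate its (e, m) entries, i.e. the trace over the two M indices.
direct = np.zeros(E)
for m in range(M):
    D_m = np.zeros((E, M))
    D_m[:, m] = J[:, m]
    direct += (alpha @ L_tilde @ D_m)[:, m]
```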
Noting that in the vanilla propagation process $\alpha^{t+1} = \alpha^t - \eta_1 \frac{\partial L(\alpha^t)}{\partial \alpha^t}$, and combining this with Eq. 4.49, we have
\[
\tilde{\alpha}^{t+1} = \alpha^{t+1} - \eta
\begin{bmatrix}
\sum_{m=1}^{M} \sum_{e'=1}^{E} \tilde{g}_{e'} \frac{\partial w_m}{\partial \alpha_{e',m}} \\
\vdots \\
\sum_{m=1}^{M} \sum_{e'=1}^{E} \tilde{g}_{e'} \frac{\partial w_m}{\partial \alpha_{e',m}} \\
\vdots \\
\sum_{m=1}^{M} \sum_{e'=1}^{E} \tilde{g}_{e'} \frac{\partial w_m}{\partial \alpha_{e',m}}
\end{bmatrix}
\circledast
\begin{bmatrix}
\alpha_1 \\
\vdots \\
\alpha_e \\
\vdots \\
\alpha_E
\end{bmatrix}
= \alpha^{t+1} + \eta \psi^t \circledast \alpha^t,
\tag{4.50}
\]
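One plausible reading of the corrected update, sketched in NumPy: each bracket row is the same $M$-vector of summed sensitivities, and $\circledast$ is taken here as a row-wise Hadamard product with $\alpha^t$, with $\psi^t$ absorbing the minus sign. The shapes, placeholder gradients, and this interpretation of $\circledast$ are assumptions for illustration, not the original implementation:

```python
import numpy as np

# Hypothetical sizes, step size, and placeholder tensors.
E, M, eta = 3, 5, 0.1
rng = np.random.default_rng(4)
alpha_t = rng.normal(size=(E, M))      # alpha^t
L_tilde = rng.normal(size=(M, E))
J = rng.normal(size=(E, M))            # J[e', m] ~ d(w_m)/d(alpha_{e',m})
grad = rng.normal(size=(E, M))         # placeholder dL(alpha^t)/d(alpha^t)

# Vanilla step, then the correction of Eq. 4.50.
alpha_t1 = alpha_t - eta * grad
v = (L_tilde @ J).sum(axis=1)          # the repeated bracket row (an M-vector)
psi = -np.broadcast_to(v, (E, M))      # psi^t absorbs the minus sign (assumption)
alpha_corrected = alpha_t1 + eta * psi * alpha_t
```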